OCR for Arabic using SIFT Descriptors With Online Failure Prediction

Authors

  • Andrey Stolyarenko
  • Nachum Dershowitz
Abstract

Character recognition for Arabic texts poses a twofold challenge: segmenting words into letters and identifying the individual letters. We propose a method that combines the two tasks, using a grid of SIFT descriptors as features for classifying letters. Each word is scanned with windows of increasing size; segmentation points are set where the classifier achieves maximal confidence. Because Arabic has four types of letters (isolated, initial, middle, and final), we are also able to predict whether a word has been segmented correctly. The performance of the algorithm on printed texts and computer fonts was evaluated on the PATS-A01 dataset. For fonts with non-overlapping letters, we achieve letter correctness of 87–96% and word correctness of 74–88%. For overlapping fonts, although the word correctness is low, only 14–23% are not predicted to be wrong. We suggest several approaches for improved performance, with and without exploiting failure prediction.

Keywords: OCR, Optical Character Recognition, Arabic, SIFT, Machine Learning, Predicting Correctness
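The scan-and-check idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The `classify` callable is a hypothetical stand-in for a SIFT-based letter classifier, the greedy choice of the highest-confidence window is an assumption, and the consistency rule in `forms_consistent` is one plausible way the four letter types could be used to flag a bad segmentation. Offsets are measured along the reading direction.

```python
from typing import Callable, List, Tuple

# (letter, form, confidence); form is one of
# "isolated", "initial", "middle", "final".
Prediction = Tuple[str, str, float]


def segment_word(
    width: int,
    classify: Callable[[int, int], Prediction],  # hypothetical classifier
    min_w: int,
    max_w: int,
    step: int,
) -> List[Prediction]:
    """Greedy scan: at each position try windows of increasing size and
    cut where the classifier reports its highest confidence."""
    start = 0
    letters: List[Prediction] = []
    while start < width:
        best = None
        best_end = start
        for w in range(min_w, max_w + 1, step):
            end = min(start + w, width)
            pred = classify(start, end)
            if best is None or pred[2] > best[2]:
                best, best_end = pred, end
            if end == width:  # window already reaches the end of the word
                break
        letters.append(best)
        start = best_end
    return letters


def forms_consistent(forms: List[str]) -> bool:
    """Assumed failure check: an initial or middle letter must be followed
    by a middle or final one; a word must not end on a joining form."""
    expecting_join = False  # True when the previous letter connects forward
    for form in forms:
        allowed = ("middle", "final") if expecting_join else ("isolated", "initial")
        if form not in allowed:
            return False
        expecting_join = form in ("initial", "middle")
    return bool(forms) and not expecting_join
```

A segmentation whose predicted forms fail this check would be flagged as probably wrong, which is the kind of online failure prediction the abstract describes; the exact rule used in the paper may differ.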

Similar resources

Using SIFT Descriptors for OCR of Printed Arabic

Although optical character recognition of printed texts has been a focus of research for the last few decades, Arabic printed text, being cursive, still poses a challenge. The challenge is twofold: segmenting words into letters and identifying individual letters. We propose a method that combines the two tasks, using multiple grids of SIFT descriptors as features. To construct a classifier, we ...

Building Detection from Mobile Imagery Using Informative SIFT Descriptors

We propose reliable outdoor object detection on mobile phone imagery from off-the-shelf devices. With the goal of providing both robust object detection and reduced computational complexity for situated interpretation of urban imagery, we propose to apply the 'Informative Descriptor Approach' to SIFT features (i-SIFT descriptors). We learn an attentive matching of i-SIFT keypoints, resulting...

Word Spotting in Handwritten Arabic Documents Using Bag-Of-Descriptors

This paper presents a query-by-example word spotting method for handwritten Arabic documents, based on the Scale Invariant Feature Transform (SIFT), without using any word or line segmentation, because any segmentation errors affect the subsequent word representation. First, the interest points are automatically extracted from the images using the SIFT detector; then we use the SIFT descriptor to represent eac...

Robust Image Matching with Selected SIFT Descriptors

A robust image matching algorithm using a set of selected SIFT descriptors is investigated in this work. We first utilize the color-based segmentation method and the watershed algorithm to separate foreground and background regions in images and then search the corresponding SIFT descriptors along foreground contours. These selected SIFT descriptors can offer more robust and stable image matchin...

Performance evaluation of block-based copy-move image forgery detection algorithms

Copy-move forgery is a particular type of distortion in which one or more portions of an image are copied to other parts of the same image. This type of manipulation is done to hide a particular part of the image or to copy one or more objects into the same image. There are several methods for detecting copy-move forgery, including block-based and keypoint-based methods. In this paper, a method...

Publication date: 2011